Tutorial 7 - Three-way Branch Migration Transitions

This example is similar to Tutorial 1, where we studied the opening and closing of a hairpin ('kissing stem-loop'). This time, we discuss how multistranded complexes are handled. Mainly, we think about what the exact and loose macrostate differences are -- what do they tell us?

Let's start with some basic imports:

In [1]:
#import os
#os.environ["NUPACKHOME"] = ".."    # use this if you notice NUPACKHOME errors

if False:  # only needed if you're having trouble with your Multistrand installation
    import multistrand_setup

try:
    from multistrand.objects import *
    from multistrand.options import Options
    from multistrand.system import SimSystem

except ImportError:
    print("Could not import Multistrand.")
    raise


import numpy as np

Macrostates, complexes, strands and domains.

As before, the macrostates definitions are used as stop conditions for our simulation:

In [2]:
# for StopCondition and Macrostate definitions:
Exact_Macrostate = 0   # match a secondary structure exactly (i.e. any system state that has a complex with this exact structure)
Bound_Macrostate = 1   # match any system state in which the given strand is bound to another strand
Dissoc_Macrostate = 2  # match any system state in which there exists a complex with exactly the given strands, in that order
Loose_Macrostate = 3   # match a secondary structure with "don't care"s, allowing a certain number of disagreements
Count_Macrostate = 4   # match a secondary structure, allowing a certain number of disagreements
# see Schaeffer's PhD thesis, chapter 7.2, for more information

In the following setup, we simulate a three-way branch migration. To study how the branch migration progresses, we define intermediate structures and measure how much time the simulation spents in each of these states. We also estimate the transition probability between each of these states. In the below, we define the initial structure, in addition to structures with six, twelve and eightteen displaced base-pairs.

We explore two competing approaches by defining exact and the loose macrostates. Note how in the dot-parenthesis notation, which is used to denote secondary structure for our nucleic acids, the asterix is used as a wildcard. In the loose macro states, only eight out of the seventy-two possible base-pairs are defined, and if any state includes at least six of those bounds, then it is in the set of matching states. In contrast, the exact macrostates have no wildcards and only when the simulation reaches that the exact state, a match is recorded.

In [3]:
def setup_options_threewaybm(toehold_seq = "GTGGGT", bm_design = "ACCGCACGTCACTCACCTCG"):

    # the structures below are hard-coded for these lengths
    assert len(toehold_seq)==6
    assert len(bm_design)==20

    toehold = Domain(name="toehold",sequence=toehold_seq,length=6)
    branch_migration = Domain(name="bm", sequence=bm_design, seq_length=20)
    
    substrate = toehold + branch_migration
    incumbent = Strand(name="incumbent",domains=[branch_migration.C])

    incoming = substrate.C

    # start with 6-base toehold fully bound
    start_complex = Complex(strands=[incoming, substrate, incumbent], structure=".(+)(+)")

    initial_structure_dp   = "....................((((((+))))))((((((((((((((((((((+))))))))))))))))))))"
    six_bases_structure_dp = "..............((((((((((((+))))))))))))((((((((((((((+))))))))))))))......"
    six_bases_loose_dp     = "**************((**********+**********))((************+************))******"
    twelve_bases_struc_dp  = "........((((((((((((((((((+))))))))))))))))))((((((((+))))))))............"
    twelve_bases_loose_dp  = "********((*****************+***************))((******+******))************"
    eighteen_structure_dp  = "..((((((((((((((((((((((((+))))))))))))))))))))))))((+)).................."
    eighteen_loose_dp      = "**((**********************+**********************))((+))******************"

    six_bases_complex           = Complex(strands=[incoming,substrate,incumbent], structure=six_bases_structure_dp)
    twelve_bases_complex        = Complex(strands=[incoming,substrate,incumbent], structure=twelve_bases_struc_dp)
    eighteen_bases_complex      = Complex(strands=[incoming,substrate,incumbent], structure=eighteen_structure_dp)
    six_basesloose_complex      = Complex(strands=[incoming,substrate,incumbent], structure=six_bases_loose_dp)
    twelve_basesloose_complex   = Complex(strands=[incoming,substrate,incumbent], structure=twelve_bases_loose_dp)
    eighteen_basesloose_complex = Complex(strands=[incoming,substrate,incumbent], structure=eighteen_loose_dp)

    disassoc_complex            = Complex(strands=[incumbent], structure=".")   # succesful strand displacement
    failed_complex              = Complex(strands=[incoming], structure="..")   # failed strand displacement attempt

    start_sc          = Macrostate("INITIAL", [(start_complex,Count_Macrostate,2)])                 # Within distance 2 of the start_complex state.
    six_sc_exact      = Macrostate("SIX_EXACT", [(six_bases_complex,Exact_Macrostate,0)])           # the third parameter is ignored; not needed for exact macrostates
    six_sc_loose      = Macrostate("SIX_LOOSE", [(six_basesloose_complex,Loose_Macrostate,2)])      # 8 base pairs defined; must have at least 6 to match.
    twelve_sc_exact   = Macrostate("TWELVE_EXACT", [(twelve_bases_complex,Exact_Macrostate,0)])
    twelve_sc_loose   = Macrostate("TWELVE_LOOSE", [(twelve_basesloose_complex,Loose_Macrostate,2)])
    eighteen_sc_exact = Macrostate("EIGHTEEN_EXACT", [(eighteen_bases_complex,Exact_Macrostate,0)])
    eighteen_sc_loose = Macrostate("EIGHTEEN_LOOSE", [(eighteen_basesloose_complex,Loose_Macrostate,2)])

    # why bother giving a list of just one macrostate-def tuple?  A Macrostate with a list of multiple tuples give the AND (i.e. intersection) of microstates.

    completed_sc      = StopCondition("stop:COMPLETE", [(disassoc_complex,Dissoc_Macrostate,0)])  # incumbent strand fell off  
    rejected_sc       = StopCondition("stop:REJECTED", [(failed_complex,Dissoc_Macrostate,0)])    # incoming strand fell off

    # join_concentration is not defined, because in this simulation we stop before there's any chance for association steps
    o_exact = Options(simulation_mode="Transition",parameter_type="Nupack", dangles="Some",
                substrate_type="DNA", num_simulations = 10, simulation_time=.01, temperature=310.15,
                start_state=[start_complex], rate_scaling='Calibrated', verbosity=0)

    o_exact.stop_conditions = [start_sc, six_sc_exact, twelve_sc_exact, eighteen_sc_exact, completed_sc, rejected_sc]
    
    o_loose = Options(simulation_mode="Transition",parameter_type="Nupack", dangles="Some",
                substrate_type="DNA", num_simulations = 10, simulation_time=.01, temperature=310.15,
                start_state=[start_complex], rate_scaling='Calibrated', verbosity=0)

    o_loose.stop_conditions = [start_sc, six_sc_loose, twelve_sc_loose, eighteen_sc_loose, completed_sc, rejected_sc]

    return o_exact, o_loose

Were you able to figure out the topology of the initial state, initial_structure_dp, in the above? It looks like this:

With this in mind, you'll be able to work out the other states, which are specified to have six, twelve and eighteen displaced base-pairs, respectively.

To keep track of the progress of the three-way branch migration, the following helper functions are defined:

In [4]:
# mol will be a list of True/False for which transition macrostates the system has entered
# so in_state(mol) returns True if the system is in at least one of the listed macrostates.
def in_state( mol ): return sum(mol) > 0

# mol is a Boolean descriptor of macrostate occupancy, like mol above.
# a short-hand name for this macrostate (based on the order given in stop_conditions) is provided.
def mol_name(mol):
    charindex = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ0'
    names = [charindex[j] for i,j in zip(mol,range(len(mol))) if i]
    if names == []:
        names = charindex[26]
    else:
        names = ",".join(names)
    return names

# t0 and t1 are Boolean descriptors of macrostate occupancy, like mol above.
# here, we provide a printable name for the transition between two macrostate occupancy lists.
def trans_name(t0,t1):
    return mol_name(t0) + ' -> ' + mol_name(t1)

def print_transitions( transition_traj ):
    for t in transition_traj:
        print "%12g : %s" % ( t[0], mol_name(t[1]) )
In [5]:
# for each simulation, the transition trajectory reports the tuple (time_entered, which_macrostates_the_system_is_now_in)
def parse_transition_lists( transition_traj_list ):
    transition_dict = {}

    # the mol1 --> mol2 transition times represent (time of entry into mol1) to (time of entry into mol2)
    for transition_traj in transition_traj_list:
        truncated = [i for i in transition_traj if in_state(i[1])]
        tt = truncated # only keep the entry times and mol states for non-trivial mols

        for i in range(len(tt)-1):
            nm = trans_name(tt[i][1],tt[i+1][1])
            if nm in transition_dict:
                transition_dict[nm].append( tt[i+1][0] - tt[i][0] )
            else:
                transition_dict[nm] = [tt[i+1][0] - tt[i][0]]

    return transition_dict

def parse_transition_list( transition_traj_list ):
    return parse_transition_lists( [transition_traj_list] )

    
def print_transition_dict( transition_dict, options = None ):
    k = transition_dict.keys()
    k.sort() 

    for i in k:
        transition_times = np.array( transition_dict[i] )
        print("{0}: {2:.2e} ({1})".format(i,len(transition_dict[i]),np.mean(transition_times)))
    
    # also print the true names of the macrostates, if an Options object is provided
    charindex = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ0'
    if options:
        for i,idx in zip(options.stop_conditions,range(len(options.stop_conditions))):
            print("{0}: {1}".format( i.tag, charindex[idx]))

We are now ready to start the simualtion.

In [6]:
# Could take up to <10s to run

print "--- Running simulations ---"
o_exact,o_loose = setup_options_threewaybm() 

# Try this too... The toehold dissociates much slower.
#o_exact,o_loose = setup_options_threewaybm(bm_design="ACCGCACGTCCACGGTGTCG") 

s=SimSystem(o_exact)
s.start()
print "--- Finished simulations ---"
--- Running simulations ---
--- Finished simulations ---

The following function plots the results for the exact macrostates. Internally, multistrand stores the time and macrostate whenever the simulation enters a macrostate. A transition between macrostates A and B is therefore registered if the simulation, currently in A, enters B before entering any other macrostate.

In [7]:
def print_results( o ):
    print
    print "--- Analysis of simulations by transitional states ---"
    # print "  Coarse-grained trajectory of simulation #1:"
    # print_transitions(o1.interface.transition_lists[0])
    print "  Transitions from simulation #1:"
    parsedlist = parse_transition_list(o.interface.transition_lists[0])
    print_transition_dict(parsedlist)
    print "  Transitions averaged over all %d simulations:" % o.num_simulations
    parsedlist = parse_transition_lists(o.interface.transition_lists)
    print_transition_dict(parsedlist,o) # adds names for macrostates

print_results(o_exact)
--- Analysis of simulations by transitional states ---
  Transitions from simulation #1:
A -> A: 9.99e-09 (354)
A -> B: 2.14e-07 (2)
B -> A: 2.20e-07 (1)
B -> B: 1.12e-09 (3)
B -> C: 1.68e-07 (1)
C -> C: 5.90e-08 (72)
C -> D: 1.19e-07 (1)
C -> E: 9.96e-07 (1)
D -> C: 3.67e-07 (1)
D -> D: 1.12e-08 (25)
  Transitions averaged over all 10 simulations:
A -> A: 8.84e-09 (32629)
A -> B: 2.53e-07 (40)
A -> C: 7.44e-07 (2)
A -> F: 5.64e-08 (1)
B -> A: 2.48e-07 (33)
B -> B: 4.57e-08 (710)
B -> C: 8.57e-07 (9)
C -> B: 1.31e-06 (2)
C -> C: 7.55e-08 (568)
C -> D: 1.85e-07 (13)
C -> E: 9.96e-07 (1)
D -> C: 1.60e-07 (5)
D -> D: 9.04e-09 (880)
D -> E: 5.16e-08 (8)
INITIAL: A
SIX_EXACT: B
TWELVE_EXACT: C
EIGHTEEN_EXACT: D
stop:COMPLETE: E
stop:REJECTED: F

Between state A, the initial state, and state E, indicating succesful migration, and state F, indicating the disocciation of the invading strand from the complex, can you work out in how many simulations the migration was completed?

We now run the same simulation, and report the time spent in the loose macrostates:

In [8]:
s=SimSystem(o_loose)
s.start()
print_results(o_loose)
--- Analysis of simulations by transitional states ---
  Transitions from simulation #1:
A -> A: 8.60e-09 (1288)
A -> B: 1.57e-07 (3)
B -> A: 1.72e-07 (2)
B -> B: 3.31e-08 (57)
B -> C: 1.36e-07 (1)
C -> C: 7.02e-08 (16)
C -> D: 5.01e-08 (1)
D -> D: 3.77e-08 (12)
D -> E: 4.35e-08 (1)
  Transitions averaged over all 10 simulations:
A -> A: 9.16e-09 (30124)
A -> B: 1.55e-07 (38)
B -> A: 1.59e-07 (28)
B -> B: 3.09e-08 (866)
B -> C: 7.90e-07 (17)
C -> B: 4.85e-07 (7)
C -> C: 6.33e-08 (400)
C -> D: 1.23e-07 (14)
D -> C: 1.69e-07 (4)
D -> D: 3.48e-08 (191)
D -> E: 1.89e-07 (10)
INITIAL: A
SIX_LOOSE: B
TWELVE_LOOSE: C
EIGHTEEN_LOOSE: D
stop:COMPLETE: E
stop:REJECTED: F

The lesson here is similar to Tutorial 1. When exact macrostates are tracked, not much information is obtained because the system does not spend much time in each exact macrostate. Using a more relaxed macrostate, we are able to track the migration process in greater detail.